Part-of-speech Tagging of Code-Mixed Social Media Text
نویسندگان
چکیده
A common step in the processing of any text is the part-of-speech tagging of the input text. In this paper, we present an approach to tackle code-mixed text from three different languages Bengali, Hindi, and Tamil apart from English. Our system uses Conditional Random Field, a sequence learning method, which is useful to capture patterns of sequences containing code switching to tag each word with accurate part-of-speech information. We have used various pre-processing and post-processing modules to improve the performance of our system. The results were satisfactory, with a highest of 75.22% accuracy in Bengali-English mixed data. The methodology that we employed in the task can be used for any resource poor language. We adapted standard learning approaches that work well with scarce data. We have also ensured that the system is portable to different platforms and languages and can be deployed for real-time analysis.
منابع مشابه
POS Tagging of Hindi-English Code Mixed Text from Social Media: Some Machine Learning Experiments
We discuss Part-of-Speech(POS) tagging of Hindi-English Code-Mixed(CM) text from social media content. We propose extensions to the existing approaches, we also present a new feature set which addresses the transliteration problem inherent in social media. We achieve an 84% accuracy with the new feature set. We show that the context and joint modeling of language detection and POS tag layers do...
متن کاملSMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text
Use of social media has grown dramatically fast during the past few years. Users usually follow informal languages in communicating through social media. This language of communication is often mixed in nature, where people transcribe their regional language with English. This technique of writing is increasing its popularity rapidly. Natural language processing (NLP) aims to infer the informat...
متن کاملA CRF Based POS Tagger for Code-mixed Indian Social Media Text
In this work, we describe a conditional random fields (CRF) based system for Part-OfSpeech (POS) tagging of code-mixed Indian social media text as part of our participation in the tool contest on POS tagging for codemixed Indian social media text, held in conjunction with the 2016 International Conference on Natural Language Processing, IIT(BHU), India. We participated only in constrained mode ...
متن کاملRecurrent Neural Network based Part-of-Speech Tagger for Code-Mixed Social Media Text
This paper describes Centre for Development of Advanced Computing’s (CDACM) submission to the shared task’Tool Contest on POS tagging for CodeMixed Indian Social Media (Facebook, Twitter, and Whatsapp) Text’, collocated with ICON-2016. The shared task was to predict Part of Speech (POS) tag at word level for a given text. The codemixed text is generated mostly on social media by multilingual us...
متن کاملPart-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON 2015
This paper discusses the experiments carried out by us at Jadavpur University as part of the participation in ICON 2015 task: POS Tagging for Code-mixed Indian Social Media Text. The tool that we have developed for the task is based on Trigram Hidden Markov Model that utilizes information from dictionary as well as some other word level features to enhance the observation probabilities of the k...
متن کاملPart-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages
The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-ofspeech tag set. We compare the performance of a combination of language specific taggers to that of applying four machine learning algorithms to the task (Condi...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2016